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Abstract — This paper examines the effectiveness of a sparse 
Bayesian algorithm to estimate multivariate autoregressive coef- 
ficients when a large amount of background interference exists. 
This paper employs computer experiments to compare two 
methods in the source-space causality analysis: the conventional 
least-squares method and a sparse Bayesian method. Results 
of our computer experiments show that the interference affects 
the least-squares method in a very severe manner. It produces 
large false-positive results, unless the signal-to-interference ratio 
is very high. On the other hand, the sparse Bayesian method 
is relatively insensitive to the existence of interference. However, 
this robustness of the sparse Bayesian method is attained on 
the scarifies of the detectability of true causal relationship. Our 
experiments also show that the surrogate data bootstrapping 
method tends to give a statistical threshold that are too low 
for the sparse method. The permutation-test-based method gives 
a higher (more conservative) threshold and it should be used 
with the sparse Bayesian method whenever the control period is 
available. 

I. Introduction 

Estimating a causal relationship among cortical activities 
using the MEG/EEG source space analysis has gained a great 
interestfl], 0. Such causality analysis generally requires a 
two step procedure: The first step estimates source time series 
from MEG/EEG measurements. The second step computes 
some types of causality measures using the estimated time 
series of target source activities. Here, popular measures are 
Granger-causality -based measures. Many investigations have 
been performed in the past fifteen years to explore the ef- 
fectiveness of the Granger-causality-based measures in brain 
signal analysis[3], [4|, J5|. 

The Granger-causality measures rely on the accurate model- 
ing of the multivariate vector auto-regressive (MVAR) process 
of the source time series. Since, in general, the causality 
analysis is performed using the source time series estimated 
from non-averaged data, the estimated time series inevitably 
contains large influence of brain background interference, 
which is often referred to as the brain noise. However, the 
MVAR modeling in general does not take such interference 
into account, their existence may cause significant amount of 



errors in the estimated MVAR coefficients, leading to a mis- 
estimation of completely wrong causality relationships. 

One approach to reduce the errors in the MVAR estimation 
caused due to the background interference is to impose the 
sparsity constraint when estimating the MVAR coefficients. 
The key assumption here is that true brain interaction causes 
small number of MVAR coefficients to have non-zero values, 
and most of MVAR coefficients remain to be zero. If this 
is true, the sparsity constraint imposed when estimating the 
MVAR coefficients should be able to prevent most MVAR co- 
efficients to have erroneously large values due to the influence 
of background interference. 

This paper tests the effectiveness of a method that im- 
poses the sparsity on its solution. This paper uses computer 
experiments to compare two methods for MVAR coefficient 
estimation: the conventional least-squares method and a sparse 
Bayesian method that imposes the sparsity on its solution. In 
Section |IlJ the two methods are briefly described. We present 
the results of our computer experiments to compare these two 
methods in Section UIT1 

II. Method 
A. Estimation of MVAR coefficients 

1) Least-squares-based algorithm: We assume that total K 
multiple time series exist and we model these multiple time 
series using the multivariate vector auto-regressive (MVAR) 
process. Let us denote the kt\\ time series as Sk(t), where k = 
1,...,K, and define a column vector s(t) such that s(t) = 
[s±(t), . . . , sk (t)] T ■ The time t is expressed using a unit-less 
value. This s(t) is expressed as MVAR process, such that 

p 

s(t)=J2A(p)s(t-p) + e(t), (1) 
p=i 

where A(p) is the pth coefficient matrix, P is the model order, 
and e(t) is the residual vector. 

We can estimate the MVAR coefficients Ai,j(p) where 
i, j = 1, . . . , K and p = 1, . . . , P based on the least-squares 



principle. To derive the least-squares equation, let us explicitly 
write the MVAR process for the fcth component Sk(t) as 



(t) = Y t A k j{i)8 j (t-i) 
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+ ■■■ +J2A k>j (P) Sj (t - P) + e k (t). (2) 
4=1 

We assume that the source time series are obtained at t = 
1, . . . , N T where N T > K x P. Then, since Equation Q 
holds for t = P+l, . . . , Nt, a total of Nt~P linear equations 
are obtained by setting t = P + 1, . . . , Nt in Eq. 
These equations are formulated as a matrix form, 

y k =&x k + e k . (3) 

Here, the (Nt — P) x 1 column vector y k is defined as 

y fc = MP + 1), s*(P + 2), . . . , Sfe (7V T )] T , (4) 

where the superscript T indicates the matrix transpose. In 
Eq. l(3), # is an (Nt~P) x PA' matrix whose (i, j')th element 
<?i j:) - is given by 



where 
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(5) 



(6) 



and [•] indicates the integer not greater than the value in 
parentheses. The column vector x k is expressed as 

x k = [A kil (l), . . -,A kjK (l), . . .,A kA (P), A KK (P)] T . 

(7) 

The residual vector e k is given by 



e k = [e k (P + l),...,e k (N T )) 



(8) 



Equation Q is called the Yule-Walker equation. The least- 
squares estimate of x k , x k , is then obtained using, 



x k = (# T #)- 1 # T y fc . 



(9) 



2) Sparse Bayesian algorithm: We introduce an algorithm 
that imposes the sparsity on its solution. One powerful algo- 
rithm is based on the Bayesian statistics applied to solving the 
linear equation in Eq. ([3J. In this algorithm, the prior proba- 
bility distribution of x k , f(x k ), is assumed to be Gaussian: 



f(x k )=M(x k \0,v), 



(10) 



where is a column vector whose elements are all zero, v 
is a diagonal precision matrix. The probability distribution of 
y k given x k , f(y k \x k ), is also assumed to be Gaussian: 



f(y k \x k ) =JV(y k \&x k ,A), 



(11) 



where A is a diagonal noise precision matrix. Then, the pos- 
terior distribution of x k , f(x k \y k ), is shown to be Gaussian, 
and it is expressed as 



f(x k \y k ) = Af(x k \x k ,r). 



(12) 



The estimation of x k and P is carried out by using the 
well-known expectation-maximization (EM) algorithm[7|. The 
update rules in the E step of the EM algorithm give the 
estimates of those parameters, such that 
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(13) 
(14) 



The parameters, v and A, are estimated in the M step of 
the EM algorithm. The diagonal components of v are obtained 
using 



1 



where 



and 
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j indicates the (j,j)th (diagonal) element of a matrix 
in parentheses. The noise precision matrix A is obtained using 



A 1 = diag 
where 
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and R xy = 



x k yj, 



(17) 



(18) 



and diag[-] indicates a diagonal matrix whose diagonal el- 
ements are equal to those of a matrix in parentheses. The 
estimate of the MVAR coefficients is given from x k , after 
the EM iteration is finished. This algorithm is similar to but 
considerably simpler than the one proposed in J6). 

B. Partial directed coherence 

Once the MVAR coefficient matrices are obtained, Granger- 
causality-based measures can be computed. In this paper, we 
use the partial directed coherence (PDC) proposed in J4j. To 
derivce PDC, we first compute 

p 

A(f) = I-Y,Mp)e- 2mpf , (19) 

P =i 

where I is the identity matrix with its size equal to the size 
of the coefficient matrix A(p). Then, the PDC from the fcth 
source to the jth source, ^ k ^j, can be computed using 

\A 3 Af)\ 



a k a k 



(20) 



where a k represents the fcth column of the matrix A(f), and 
the superscript H indicates the Hermitian transpose. 

In actual measurements of brain signals, multiple trials 
(realizations) are usually measured. In such cases, the above 
A(f) is computed from each trial, and the average of each 
A(f), (A(f)), is then obtained where (■) indicates the average 
across trials. This (A(f)) is used to derive PDC in Eq. (20i, 
such that, 

lfe(/))l 



(21) 



In our computer experiments, Eq. (21 
PDC. 



is used to compute 



C. Statistical thresholding 

1) Surrogate-data bootstrapping: In this paper, two types 
of statistical thresholding methods are tested: the surrogate- 
data bootstrapping [8 1 and the permutation test[9|. The 
surrogate-data bootstrapping is widely used in the MVAR- 
based causal analysis. In this method, the Fourier transform 
of the time series Sj(t) is first computed; it is denoted (Jj(fk) 
where f k indicates the fcth frequency bin. We multiply a 
random phase to <Jj(fk) to create o~j{fk) exp(— is/.) where 
£k is a uniform random number distributed between — 7r 
and 7T. The phase-modulated spectrum o-j(fk) exp(— iek) is 
then inverse Fourier transformed to create s*(t), which is 
called the surrogate time series. This procedure is repeated for 
j = 1,...,K with generating a new random number to Ef.. 
The resultant surrogate data set s* (t) where j = 1, . . . , K does 
not anymore contain causal relationships, although it maintains 
the same power spectra as those of the original time series. 

Since there are (infinitely) many ways of generating random 
phases, many different surrogate data sets can be generated. 
Let us assume that we generate total Nb surrogate data sets, 
and compute Nb different PDC results using these surrogate 
data sets. We denote these PDC results as ^^{fk) where (3 — 
1,...,Nb^\ 

We may derive the statistical threshold using the null distri- 
bution formed using $>"(fk) (fi = 1, . . . ,B). However, such 
a statistical threshold does not take the multiple comparisons 
into account. To derive a statistical threshold that takes the 
multiple comparisons into consideration, the values ^(/fc) 
(/? — 1, . . . , B) are standardized, such that 

where ^^{fk) and cr B (fk) are the average and the variance of 
* /3 (/ fe ). The maximum value of T l3 (f k ) is denoted T max (f k ). 
The null distribution for deriving the statistical threshold is 
formed using T max {fk) from all frequency bins. That is, 
denoting a total number of the frequency bins Nf, we sort 
Tmaxifk) such that 



(22) 



(23) 



Here, T max (j) is the jth smallest value of T max (f k ). Setting 
the significance level as a, which is the probability of occur- 
ring the type I error, the threshold value T^ ax is determined 
as T max (cu) where u = (1 — ct)Nf. The threshold for each 
frequency bin f k , ® t h(fk), is derived as 



%h(fk)=T t ± x a B (f k ) + ^(f k ). 



(24) 



This ^th(fk) is the statistical threshold that takes the multiple 
comparisons into account. 

2) Permutation-test based thresholding: In our numerical 
experiments, we introduce another method to derive the sta- 
tistical threshold, which is based on the permutation test. To 
implement the permutation-test-based method, a prerequisite 



is that the control data^jis available, and the method computes 
the MVAR coefficient also from the control data. Let us 
denote the MVAR coefficient computed from the mth-trial 



task data as A\ . ' (p) and that from the mth-trial control data 
as c Aij {p). Denoting the total number of trials as N e , the 
N e /2 coefficients are randomly chosen from A^{p) where 
m = 1, . . . , N e and another N e /2 coefficients are randomly 
chosen from c Af^'{jp) where m = 1, . . . , N e . These total N e 
coefficients are averaged, and these mean MVAR coefficients 
are used to compute PDC with Eq. pT) . Since many number 
of the way to choose A^"\p) and c A^™\p), many number 
of mean MVAR coefficients can be obtained. Assuming that 
we have a total of Nb mean MVAR coefficients, we can 
use Nb different PDC values to form a null distribution. 
Therefore, using exactly the same procedure as described in 
Section II-C1 we can derive the statistical threshold ^th{fk)- 



III. Computer Experiments 

A. Generation of simulated MEG recordings 

We perform numerical experiments to compare the perfor- 
mances of the algorithms described in Section [IT] In these 
experiments, we apply the source-space causality analysis 
in which source time series are first estimated from MEG 
recordings, and the MVAR coefficients are then computed 
using the source time series at selected voxels. The partial 
directed coherence (PDC) is computed to assess the causality 
relationship. 

Here, we use a sensor alignment of the 275 whole-head 
MEG sensor array from Omega™ (VMS Medtech, Coquit- 
lam, Canada) neuromagnetometer. Three sources are assumed 
to exist on the vertical single plane: (x = cm). The source- 
sensor configuration and the coordinate system are depicted 
in Fig. 1(a). We assume three sources and the time series of 
these three sources are denoted s±(t), S2(t), and ss(t). We 
simulate two kinds of scenarios: interacting-source and non- 
interacting-source scenarios. In the interacting source scenario, 
the source activities are assumed to have causal relationships, 
and the time series of the three sources are generated using 
the MVAR process reported in ifTOl . which uses 
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This MVAR process represents the causal relationship depicted 
in Fig. 1(a), in which the second source has a directional 
causal influence on the third source and the third source has 
the directional influence on the first source. The true PDC 



is computed using the model MVAR coefficients in Eq. ( 25 1 



'Here, we omit the explicit notation of k — > j from the notation ^^(/fc). noise. 



2 The control data indicates the data containing only interference and sensor 
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Fig. 2. Results of computing partial directed coherence (PDC) for interacting 
source scenario. The least-squares (LS) algorithm is used. (a)Results when 
SIR equal to 2. (b)Results when SIR equal to 1. (c)Results when SIR equal 
to 0.5. (d)Results when SIR equal to 0.25. The solid lines show PDC and 
the broken lines show the statistical threshold for the 95%-significance level 
obtained with the surrogate-data bootstrap method. 
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Fig. 1. The coordinate system and source-sensor configuration used in 
the numerical experiments. The plane at x = cm is shown. The small 
circles show the locations of the three sources, and the bold arrows indicate 
their causal relationships assumed in the experiments, (b) The plot of partial 
directed coherence (PDC) computed using the model MVAR coefficients in 
Eq. J25l. The ordinate of these plots indicates the frequency normalized by the 
sampling frequency in which the value 0.5 indicates the Nyquist frequency. 
The abscissa is the PDC value, which is normalized to 1 . 
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The results are shown in Fig. 1(b). These plots of PDC is the 
ground truth for the following experiments. 

In the non-interacting source scenario, the source activities 
are assumed to have no causal relationships, and the time 
series of the three sources are generated as low-pass-filtered, 
independent random time series. Here, Si(t) is generated by 
applying low-pass filter with the cut-off frequency of 0.3 to 
a white Gaussian random time series. The time series of the 
second source, S2(t), is generated by applying the low-pass 
filter with the cutoff frequency of 0.2 to a white Gaussian 
random time series, and S3(t) is generated by applying the 
low -pass filter with the cutoff equal to 0.15 to a white Gaussian 



Fig. 3. Results of computing partial directed coherence (PDC) for interacting 
source scenario. The sparse Bayesian algorithm is used. (a)Results when SIR 
equal to 2. (b)Results when SIR equal to 1. (c)Results when SIR equal to 0.5 
(d)Results when SIR equal to 0.25. The solid lines show PDC and the broken 
lines show the statistical threshold for the 95%-significance level obtained 
with the surrogate-data bootstrap method. 



random time series. (The filter cut-off frequency is expressed 
by the the frequency normalized by the sampling frequency.) 
Note that, since the three white Gaussian random time series 
are independent, these source time series have no causal 
relationships. 

In our computer simulation, we first generated the three- 
source time series, Si(i), S2(i), and ss(t), and then computed 



the signal magnetic recordings bg(t) using the spherical 
homogeneous conductor model] 11|. The simulated sensor 
recordings b(t) were generated by adding spontaneous MEG 
signal to bs(t), such that, b(t) — bs(t) +ab](t), where bj(t) 
is the spontaneous MEG measured using the same 275 whole- 
head sensor array, and a is a constant that controls the signal- 
to-interference ratio(SIR) of the generated sensor recordings. 
We generated forty-trial recordings; each consisting of 600 
time points with SIR equal to 2, 1, 0.5, and 0.25. Note that, 
in general, causality analysis is performed using non-averaged 
trial data, the conditions with SIR equal to 1 and 2 should 
be considered high SIR conditions. The conditions with SIR 
below 0.5 are considered practical conditions for non-averaged 
measurements. 

B. Source-space causality analysis 

The source reconstruction was performed using the recently- 
proposed algorithm, called Champagne. Since the detail of this 
algorithm has already been published] 12], it is not described 
here. Champagne algorithm was applied to the simulated 
sensor recordings b(t), and reconstructed time series 
S2^)> an d were obtained as the time series at vox- 

els nearest to the assumed source locations. Here, three- 
dimensional reconstruction was performed on a region defined 
as -4 < x < 4, -4 < x < 4, and 6 < z < 12 cm with a 
voxel interval equal to 0.5 cm. 

Once source time series are estimated, we can proceed with 
the MVAR coefficient estimation using these eatimated time 
series, si(i), 3~2(*)> an d sbC*)- The MVAR coefficients were 
estimated by using the methods described in Section [II] Here, 
the MVAR coefficient estimate Xk was obtained from each 
trial and, since forty trials were generated, forty sets of x~k was 
obtained. The final estimates of the MVAR coefficients were 
obtained by averaging these forty sets of Xk- These averaged 
MVAR coefficients were then used to compute PDC. 

C. Results for interacting source scenario 

Results for interacting source scenario are shown in Figs. [2] 
and [3] According to the results in Fig. [2] the least-squares 
method gives fairly accurate results only when SIR is equal 
to 2. However, in other SIR cases, very large spurious causal 
relationships exist in any directions. Although some spurious 
relationships may be removed by the statistical thresholding, 
this is not always the case. For example, when SIR is equal to 
1 and 0.5 (Fig. |2jb) and (c)), the causal relationship for the "2 
to 1" direction exceeds the threshold, and cannot be removed 
by the statistical thresholding. 

According to the results from the sparse algorithm in Fig. [5] 
when SIR is equal to 2 or 1, the resultant PDC is almost 
identical to the ideal results in Fig. 1(b). However, when SIR 
is equal to 0.5, small amounts of spurious PDC arises, for 
example, in the "2 to 1" direction; this spurious PDC exceeds 
the statistical threshold. When SIR is 0.25, spurious PDC 
arises in such directions as "2 to 1", "1 to 2", and "3 to 2". 
Also, PDC for the "2 to 3" direction becomes significantly 
weaker than the ideal results in Fig. 1(b), indicating that the 



detectability of the true causal relationship decreases due to 
the low SIR condition. 

D. Results for non-interacting source scenario 

The results of computing PDC for non-interacting source 
scenario are shown in Fig. [4] Since in these numerical experi- 
ments, no causal relationship is assumed among the activities 
of the three sources, the PDC for all the six directions must be 
completely zero. However, the results from the least-squares 
method show a large amount of spurious PDC and the spurious 
PDC exceeds, in some cases, the statistical threshold. This 
happens even when SIR is considerably high such as SIR 
equal to one. On the other hand, in the Bayesian results, PDC 
is almost completely equal to zero when SIR is equal to 1, 
but when SIR is equal to 0.25, a small amounts of non-zero 
spurious PDC exists that are sometimes exceeds the statistical 
threshold. 

E. Results using permutation-test based thresholding 

Next, the statistical threshold obtained using the permuta- 
tion test are shown in Figs. [5] and [6] we can see that the 
statistical threshold obtained using the permutation test is 
more conservative than the threshold from the surrogate-data 
bootstrapping. In Fig. |5|d), the threshold is as high as the 
spurious PDC in the "1 to 2" and "2 to 1" directions, and the 
spurious PDC can be thresholded out in these directions. On 
the other hand, the PDC in the "3 to 1" and "2 to 3" directions 
is also thresholded out, even though true interactions exist in 
these directions. 

On the other hand, in Fig. [6] the permutation-test-based 
method can threshold out the spurious PDC in all directions. 
The permutation-test based thresholding can remove the spuri- 
ous interactions, and it is particularly effective when used with 
the sparse Bayesian algorithm, since it increases the protection 
against having false positive results, although this is attained 
on the sacrifices of the detectability to true interactions. 

IV. Summary 

Results of our computer experiments show that the interfer- 
ence affects the least-squares method in a very severe manner. 
It produces large false-positive results, and the least-squares 
method easily suffers from spurious causal relationships unless 
the signal-to-interference ratio is very high, such as SIR of 2. 
On the other hand, the sparse Bayesian method is relatively 
insensitive to the existence of interference. However, the 
robustness of the sparse Bayesian method to false positive 
results is attained on the scarifies of the detectability of true 
causal relationship. When SIR is very low, the sparse Bayesian 
method may fail to detect the true causal relationships, al- 
though the spurious causal relationship is small. 

The surrogate data bootstrapping method tends to give a sta- 
tistical threshold that are too liberal for the sparse method. This 
is probably because the sparsity constraint works effectively 
for the surrogate data, and PDC computed using the surrogate 
data set tends to be very small, and as a result of this, the 
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Fig. 4. Results of computing partial directed coherence (PDC) for non- 
interacting source scenario. (a)Results from the least-squares algorithm when 
SIR equal to 1. (b)Results from the least-squares algorithm when SIR equal 
to 0.25. (c)Results from the Bayesian algorithm when SIR equal to 1. (d) 
Results from the Bayesian algorithm when SIR equal to 0.25. The solid lines 
show PDC and the broken lines show the statistical threshold for the 95%- 
significance level obtained with the surrogate-data bootstrap method. 



Fig. 6. Results of computing partial directed coherence (PDC) for non- 
interacting source scenario with the statistical thresholding obtained using the 
permutation test. (a)Results from the least-squares algorithm when SIR equal 
to 1. (b)Results from the least-squares algorithm when SIR equal to 0.25. 
(c)Results from the Bayesian algorithm when SIR equal to 1. (d) Results from 
the Bayesian algorithm when SIR equal to 0.25. The solid lines show PDC 
and the broken lines show the statistical threshold for the 95%-significance 
level obtained with the permutation-test based method. 
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Fig. 5. Results of computing partial directed coherence (PDC) for interacting 
source scenario with the statistical thresholding obtained using the permutation 
test. (a)Results from the least-squares algorithm when SIR equal to 1. 
(b)Results from the least-squares algorithm when SIR equal to 0.25. (c)Results 
from the Bayesian algorithm when SIR equal to 1. (d) Results from the 
Bayesian algorithm when SIR equal to 0.25. The solid lines show PDC and 
the broken lines show the statistical threshold for the 95%-significance level 
obtained with the permutation-test based method. 



statistical threshold becomes very low. The permutation-test- 
based method gives a higher (more conservative) threshold and 
this method is better be used with the sparse Bayesian method 
whenever the control period is available. 
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